Self-modeling Agents Evolving in Our Finite Universe

Author

  • Bill Hibbard
Abstract

This paper proposes that we should avoid infinite sets in definitions of AI agents and their environments. For agents that evolve to increase their finite resources, it proposes a self-modeling agent definition that avoids assumptions about the agent's future form. It also proposes a consistent and complete logical theory for reasoning by AI agents in our finite universe.

1 Finitely Computable Agents

According to current physics [1], our universe has a finite information capacity of no more than $10^{120}$ bits ($10^{90}$ bits excluding gravitational degrees of freedom). Modeling artificial intelligence (AI) agents and environments with infinite sets, such as Peano arithmetic and the infinite tape of a Turing machine, introduces unnecessary theoretical complexity into our understanding of AI. Thus in my recent papers [2, 3] I have replaced the universal Turing machine in Hutter's [4] universal AI with finite stochastic programs (limited to finite memory, for which the halting problem is decidable).

Specifically, at each of a discrete series of time steps $t \in \{0, 1, 2, \ldots, T\}$, for some large $T$, the agent sends an action $a_t \in A$ to the environment and receives an observation $o_t \in O$ from the environment, where $A$ and $O$ are finite sets. Let $h_t = (a_1, o_1, \ldots, a_t, o_t) \in H$ be an interaction history, where $H$ is the set of all histories for $t \le T$. The agent's actions are motivated by a utility function $u : H \to [0, 1]$, which assigns utilities between 0 and 1 to histories. Future utilities are discounted according to a geometric temporal discount $0 < \gamma < 1$. The agent computes a prior probability $\rho(h)$ of history $h$. The value $v(h)$ of a possible future history $h$ is defined recursively by:

$$v(h) = u(h) + \gamma \max_{a \in A} v(ha) \tag{1}$$

$$v(ha) = \sum_{o \in O} \rho(o \mid ha)\, v(hao) \tag{2}$$

The recursion terminates with $v(h_t) = 0$ for $t > T$.
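The recursion in (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the names `make_value`, `toy_u` and `toy_rho` are hypothetical, histories are represented as flat tuples, and the toy utility and observation probabilities are invented purely to make the example runnable.

```python
from typing import Callable, Sequence, Tuple

# A history is a flat tuple (a1, o1, ..., at, ot); "ha" ends with an extra action.
History = Tuple[int, ...]


def make_value(
    actions: Sequence[int],
    observations: Sequence[int],
    u: Callable[[History], float],
    rho: Callable[[int, History], float],
    gamma: float,
    T: int,
) -> Callable[[History, int], float]:
    """Return v(h, t) following recursion (1)-(2), with v(h_t) = 0 for t > T."""

    def v(h: History, t: int) -> float:
        # Equation (1): utility of h plus the discounted value of the best action.
        if t > T:
            return 0.0
        return u(h) + gamma * max(v_action(h + (a,), t) for a in actions)

    def v_action(ha: History, t: int) -> float:
        # Equation (2): expectation over the next observation under rho(o | ha).
        return sum(rho(o, ha) * v(ha + (o,), t + 1) for o in observations)

    return v


# Toy assumptions (not from the paper): utility is the last observation,
# and the environment echoes the last action with probability 0.9.
def toy_u(h: History) -> float:
    return float(h[-1]) if h else 0.0


def toy_rho(o: int, ha: History) -> float:
    return 0.9 if o == ha[-1] else 0.1


v = make_value((0, 1), (0, 1), toy_u, toy_rho, gamma=0.5, T=1)
print(v((), 0))
```

Each call branches over every action and every observation, so the number of evaluated histories grows as a power of the remaining horizon; the sketch is only practical for very small $T$.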
The agent (or policy) $\pi$ is defined to take, after history $h_t$, the action:

$$\pi(h_t) := a_{t+1} = \arg\max_{a \in A} v(h_t a) \tag{3}$$

Given a history $h_t$, the agent models the environment by the program [2]:

$$q_t = \lambda(h_t) := \arg\max_{q \in Q} P(h_t \mid q)\, \rho(q) \tag{4}$$

Here $Q$ is a prefix-free language for finite stochastic programs, $\rho(q) = 2^{-|q|}$ is the prior probability of program $q \in Q$, where $|q|$ is the length of $q$ in bits, and $P(h_t \mid q)$ is the probability that $q$ computes the history $h_t$ (that is, the probability that the stochastic program $q$ computes the observations $o_i$ in response to the actions $a_i$ for $1 \le i \le t$). Then the prior probability of a possible future interaction history $h$ for use in (2) is:

$$\rho(h) = P(h \mid q_t) \tag{5}$$

2 Self-Modeling Agents

Limited resources are essential to Wang's [5] definition of intelligence and are a practical reality for agents in our universe. Although the model $\lambda(h_t)$ in (4) can be finitely computed [3], the resources necessary to compute it grow exponentially with the length of history $h_t$. Furthermore, computing the value $v(h)$ of a possible future history $h$ in (1) and (2) requires an expensive recursion. Hence an agent with limited resources must compute approximations. Increasing the accuracy of these approximations will improve the agent's ability to maximize its utility function, and hence the agent will choose actions to increase its computing resources and so increase accuracy.

Such self-improvement must be expressible by actions in set $A$. However, the agents of Section 1 cannot adequately evaluate self-improvement actions. If the agent is computing approximations to the model $\lambda(h_t)$ and to values $v(h)$ using its limited computing resources, then it cannot use those limited resources to compute and evaluate what it would compute with greater resources. In real-time interactions between the agent and the environment, the environment will not wait for the agent to slowly simulate what it would compute with greater resources.
In the agents of Section 1, values $v(ha)$ are computed by future recursion in (1) and (2). Here we define a revised agent in which values $v(ha)$ are computed for initial subintervals of the current history and in which the environment model includes the computation of such values. Given $h_t = (a_1, o_1, \ldots, a_t, o_t)$, for $i \le t$ define:

$$ov(h_{i-1} a_i) = \mathrm{discrete}\!\left( \frac{\sum_{i \le j \le t} \gamma^{j-i}\, u(h_j)}{1 - \gamma^{t-i+1}} \right) \tag{6}$$

Here $h_j = (a_1, o_1, \ldots, a_j, o_j)$, discrete() samples real values to a finite subset $R$ of the reals (e.g., floating point numbers), and division by $(1 - \gamma^{t-i+1})$ scales values of finite sums to values as would be computed by infinite sums. Define $o'_i = (o_i, ov(h_{i-1} a_i))$ and $h'_t = (a_1, o'_1, \ldots, a_t, o'_t)$. That is, values $ov(h_{i-1} a_i)$ computed from past interactions are included as observables in an expanded history $h'_t$, so the model $\lambda(h'_t)$ includes an algorithm for computing them:

$$q_t = \lambda(h'_t) := \arg\max_{q \in Q} P(h'_t \mid q)\, \rho(q) \tag{7}$$

Define $\rho(h') = P(h' \mid q_t)$. Then compute values of possible next actions by:

$$v(h_t a) = \sum_{r \in R} \rho(ov(h_t a) = r \mid h'_t a)\, r \tag{8}$$

Here $h'_t = (a_1, o'_1, \ldots, a_t, o'_t)$ and $h_t = (a_1, o_1, \ldots, a_t, o_t)$. As in (3), define the agent's policy $\pi(h_t) = a_{t+1} = \arg\max_{a \in A} v(h_t a)$. Because $\lambda(h'_t)$ models the agent's value computations, I call this the self-modeling agent. It is finitely computable. There is no look-ahead in time beyond evaluation of possible next actions, and so no assumption about the form of the agent in the future. $\lambda(h'_t)$ is a unified model of agent and environment, and can model how possible next actions may increase values of future histories by any modification of the agent and its embedding in the environment [6].

The game of chess provides an example of learning to model value as a function of computing resources. Ferreira [7] demonstrated an approximate functional relation between a chess program's Elo rating and its search depth, which can be used to predict the performance of an improved chess-playing agent before it is built.
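As a concrete illustration of the observed values in (6), the following Python sketch computes $ov(h_{i-1} a_i)$ for every $i \le t$ from a list of past utilities. It assumes the discount weight $\gamma^{j-i}$ and the normalizer $1 - \gamma^{t-i+1}$, which is the form under which a finite discounted sum is rescaled to the magnitude of the corresponding infinite sum. The function name `observed_values` and the rounding-based `discretize` are hypothetical stand-ins; the paper only requires that discrete() map into some finite subset $R$ of the reals.

```python
from typing import Callable, List, Sequence


def observed_values(
    utilities: Sequence[float],
    gamma: float,
    discretize: Callable[[float], float] = lambda x: round(x, 6),
) -> List[float]:
    """Compute ov(h_{i-1} a_i) for i = 1..t as in (6).

    utilities[j - 1] holds u(h_j) for j = 1..t.
    discretize is an assumed stand-in for discrete(): rounding to 6 decimals.
    """
    t = len(utilities)
    values = []
    for i in range(1, t + 1):
        # Discounted sum of the utilities actually observed from step i to step t.
        finite_sum = sum(gamma ** (j - i) * utilities[j - 1] for j in range(i, t + 1))
        # Rescale the finite sum so it is comparable to an infinite discounted sum.
        scale = 1.0 - gamma ** (t - i + 1)
        values.append(discretize(finite_sum / scale))
    return values


# With constant utility 1 and gamma = 0.5, every rescaled value equals
# 1 / (1 - gamma) = 2, regardless of how many steps remain after i.
ov = observed_values([1.0, 1.0, 1.0], gamma=0.5)
print(ov)
```

The constant-utility case shows the effect of the normalizer: without it, values for steps near the end of the history would be systematically smaller simply because fewer terms contribute to the sum.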
Just as a chess program's rating can be predicted from its search depth, the self-modeling agent will learn to predict the increase of its future utility due to increases in its resources.

Utility functions defined in terms of the environment model $\lambda(h'_t)$ are a way to avoid the unintended behavior of self-delusion [8, 2]. They are also natural for complex AI agents. Rather than having preprogrammed environment models, complex AI agents must explore and learn models of their environments. But the designers of an AI agent will express their intentions for the agent's behavior choices in terms of their own knowledge of the agent's environment. Thus it is natural that they define an agent's utility function in terms of a procedure to be applied to the agent's learned environment model. I presented an example of such a procedure at AGI-12 [3].

Defining its utility function in terms of its environment model introduces a potential circularity in the self-modeling agent: $ov(h_{i-1} a_i)$ depends on $u(h_j)$ in (6), $u(h_j)$ depends on $\lambda(h'_t)$ when the utility function is defined in terms of the model, and $\lambda(h'_t)$ depends on $ov(h_{i-1} a_i)$ in (7). This circularity can be avoided by defining the utility function $u$ used in equation (6) in terms of an environment model from a previous time step.

The agent's environment model is an approximation because the model is based on a limited history of interactions with the environment and, for agents in our universe, because of limited resources for computing the model. Thus a utility function computed from the model is also an approximation to an ideal utility function that is the true expression of the intention of the agent designers. Such an approximate utility function is a possible source of AI behavior that violates its design intention.

3 Consistent and Complete Logic for Agents

Any AI agent based on a logical theory that includes Peano arithmetic (PA) faces problems of decidability, consistency and completeness.
Yudkowsky and Herreshoff [9] discuss such problems related to Löb's Theorem for a sequence of evolving agents. Our universe has finite information capacity [1]. I suggest that an agent can pursue its goal in such a finite environment without any need for PA and its theoretical problems.

An environment with finite information capacity has a finite number of possible states. In this case it is reasonable to assume a finite limit on the length of interaction histories, that the action set $A$ and the observation set $O$ never grow larger than the information capacity of the environment, and that probabilities and utility function values are constrained to a finite subset of the reals (only a finite subset can be expressed in a finite environment). Then, for a given finite limit on environment size, there are finite numbers of possible objects of these types: environments (expressed as Markov decision processes), histories, utility functions, policies, environment models that optimize equation (4) or (7) for possible histories, and agent programs (assuming agent memory is limited by the information capacity of the environment, there are a finite number of programs and their halting problem is decidable). There are also finite numbers of possible predicates and probability distributions over these types of objects and combinations of them.

So, for a given finite limit on environment size, the theory of these types of objects, predicates and probability distributions is decidable, consistent and complete (quantifiers over finite sets can be eliminated, reducing the theory to propositional calculus). In our universe with no more than $10^{120}$ bits, agents can use this theory to avoid the logical problems of PA. I suggest that more serious problems for agents in our universe are the inaccuracy of their environment models and the limits on their memory capacity and speed for reasoning in real time.
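The remark that quantifiers over finite sets can be eliminated can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the helper names `all_histories`, `forall` and `exists`, the two-element action and observation sets, and the utility `toy_u` are all hypothetical. The point is that, once the domain is finite, a universally quantified statement is a finite conjunction and an existentially quantified statement is a finite disjunction, so its truth can be decided by enumeration.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, Sequence, Tuple

# A history is a tuple of (action, observation) steps of length at most T.
History = Tuple[Tuple[int, int], ...]


def all_histories(
    actions: Sequence[int], observations: Sequence[int], T: int
) -> Iterator[History]:
    """Enumerate every history of length 0..T over finite A and O."""
    steps = list(product(actions, observations))
    for t in range(T + 1):
        yield from product(steps, repeat=t)


def forall(domain: Iterable, pred: Callable[..., bool]) -> bool:
    # Universal quantifier over a finite domain: a finite conjunction.
    return all(pred(x) for x in domain)


def exists(domain: Iterable, pred: Callable[..., bool]) -> bool:
    # Existential quantifier over a finite domain: a finite disjunction.
    return any(pred(x) for x in domain)


def toy_u(h: History) -> float:
    # Assumed toy utility: fraction of observations equal to 1 (0 for empty h).
    return sum(o for _, o in h) / len(h) if h else 0.0


H = list(all_histories((0, 1), (0, 1), T=2))

# "Every history has utility in [0, 1]" is decided by checking each history.
bounded = forall(H, lambda h: 0.0 <= toy_u(h) <= 1.0)

# "Some history of length T has utility 1" is decided the same way.
attainable = exists(H, lambda h: len(h) == 2 and toy_u(h) == 1.0)

print(len(H), bounded, attainable)
```

Enumeration of this kind is feasible only for tiny bounds; it shows decidability in principle and says nothing about the cost of deciding such statements at the scale of a real environment, which is consistent with the closing point that memory and speed limits are the more serious problems.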





Publication date: 2014